[ET-VK] Miscellaneous fixes #14803

pytorchbot · 2025-10-04T04:04:27Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #14732 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/335/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/335/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/335/orig
Differential Revision: D83703496
@diff-train-skip-merge

pytorch-bot · 2025-10-04T04:04:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14803

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 Cancelled Jobs

As of commit 1c32be5 with merge base e0dda90:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

pull / unittest / macos / macos-job (gh)
##[error]The operation was canceled.
pull / unittest-editable / macos / macos-job (gh)
##[error]The operation was canceled.
Test CoreML Backend / test-coreml / test-backend-macos (coreml_static_int8, operators) / macos-job (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

GregoryComer · 2025-10-06T19:49:15Z

@SS-JIA It looks like there's a failure in preprocess. Could you take a look? Maybe there is a dependent cherry-pick or a faulty merge resolution?

  File "/pytorch/executorch/backends/vulkan/vulkan_preprocess.py", line 175, in preprocess
    ReplaceQDQPass(),
NameError: name 'ReplaceQDQPass' is not defined

Collecting fixes for various models/ops in this diff/PR. They have all been squashed into this single change to make it easier to cherry-pick.

# Fixes

## Wav2Letter

Type: Output correctness failure

This is caused by a bug in SwiftShader and is not reproducible on any other platform. Specifically, the issue is in the softmax shader. The exact cause is unknown, but it is related to the use of shared memory within shaders. The workaround is to use separate shared memory arrays for the shared max and the shared sum.

## ConvNeXT

Type: Exception during runtime

This is caused by an incompatible memory layout being used for mean2d. More precisely, the packed dimension of the tensor cannot be one of the dims being reduced. The operator registry system had no way to select valid tensor representations based on the actual arguments of an op. To fix this, we introduce a mechanism that lets ops specify valid representations once a node's arguments are known. Once the model is exported with a supported memory layout, the model test passes.

## Inception_V3/ViT

Type: Exception during runtime

The root cause was an interaction between the fuse batch norm pass and how `vulkan_preprocess.py` was applying passes. The fuse batch norm pass creates a new param node for the fused weight, but after the pass is applied, `_copy_module` is used to copy the transformed graph back into the ExportedProgram. However, `_copy_module` appears to lowercase the node names without updating the exported program's graph signature. As a result, subsequent passes could not recognize the weight tensors of convolution nodes as constant/parameter nodes. The solution was to migrate `vulkan_preprocess.py` to the `_transform()` API instead of `_copy_module`.

## DenseNet 161 (w/ dynamic shapes)

Type: Output mismatch

Cause: the native_batch_norm op doesn't support dynamic shapes. However, the backend test runner doesn't set the correct compile option to filter out ops without dynamic shape support.

Differential Revision: [D83703496](https://our.internmc.facebook.com/intern/diff/D83703496/)

[ghstack-poisoned]

(cherry picked from commit 3f0896a)
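The ConvNeXT fix rests on one rule: a tensor layout is only valid for a reduction if its packed dim is not among the dims being reduced. The Python sketch below illustrates that rule in isolation, assuming the three packing options (width, height, channels packed) map to dims -1, -2 and -3 in NCHW order. `PackedDim`, `normalize_dims` and `valid_packed_dims` are hypothetical names for illustration only; they are not the ExecuTorch Vulkan operator registry API.

```python
from enum import Enum
from typing import List, Sequence


class PackedDim(Enum):
    # Hypothetical labels for the packing options; each value is the
    # dimension index counted from the innermost (last) dim, NCHW order.
    WIDTH = -1
    HEIGHT = -2
    CHANNELS = -3


def normalize_dims(dims: Sequence[int], ndim: int) -> List[int]:
    """Convert non-negative reduction dims into negative indices."""
    return [d - ndim if d >= 0 else d for d in dims]


def valid_packed_dims(reduce_dims: Sequence[int], ndim: int) -> List[PackedDim]:
    """Return the packings whose packed dim is not one of the reduced dims."""
    reduced = set(normalize_dims(reduce_dims, ndim))
    return [p for p in PackedDim if p.value not in reduced]


if __name__ == "__main__":
    # mean2d over H and W of a 4D NCHW tensor: only channels packing is valid.
    print(valid_packed_dims([-1, -2], 4))
```

For a mean2d-style reduction over the height and width of a 4D tensor, only channels packing survives, which is why the choice of layout has to wait until the node's reduction dims are known rather than being fixed per op.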

SS-JIA · 2025-10-06T20:12:24Z

@GregoryComer yep, just pushed an update. Also added the nightly ciflow to the PR to be extra safe that the backend tests are passing.

pytorchbot requested a review from SS-JIA as a code owner October 4, 2025 04:04

This was referenced Oct 4, 2025:

[v1.0.0] Release Tracker #14288 (Open)

[ET-VK] Miscellaneous fixes #14801 (Merged)

meta-cla bot added the CLA Signed label Oct 4, 2025

SS-JIA force-pushed the cherry-pick-14801-by-pytorch_bot_bot_ branch from 98d5969 to 1c32be5 October 6, 2025 20:11

SS-JIA added the ciflow/nightly label Oct 6, 2025

GregoryComer approved these changes Oct 7, 2025

GregoryComer merged commit 54030da into release/1.0 Oct 7, 2025
242 of 254 checks passed

GregoryComer deleted the cherry-pick-14801-by-pytorch_bot_bot_ branch October 7, 2025 00:00
